Active Learning with Strong and Weak Views: A Case Study on Wrapper Induction
نویسندگان
چکیده
Multi-view learners reduce the need for labeled data by exploiting disjoint sub-sets of features (views), each of which is sufficient for learning. Such algorithms assume that each view is a strong view (i.e., perfect learning is possible in each view). We extend the multi-view framework by introducing a novel algorithm, Aggressive Co-Testing, that exploits both strong and weak views; in a weak view, one can learn a concept that is strictly more general or specific than the target concept. Aggressive Co-Testing uses the weak views both for detecting the most informative examples in the domain and for improving the accuracy of the predictions. In a case study on 33 wrapper induction tasks, our algorithm requires significantly fewer labeled examples than existing state-of-the-art approaches.
منابع مشابه
View Validation: A Case Study for Wrapper Induction and Text Classification
Wrapper induction algorithms, which use labeled examples to learn extraction rules, are a crucial component of information agents that integrate semi-structured information sources. Multi-view wrapper induction algorithms reduce the amount of training data by exploiting several types of rules (i.e., views), each of which being sufficient to extract the relevant data. All multiview algorithms re...
متن کاملActive Learning with Multiple Views
Active learners alleviate the burden of labeling large amounts of data by detecting and asking the user to label only the most informative examples in the domain. We focus here on active learning for multi-view domains, in which there are several disjoint subsets of features (views), each of which is sufficient to learn the target concept. In this paper we make several contributions. First, we ...
متن کاملSelective Sampling with Redundant Views
Selective sampling, a form of active learning, reduces the cost of labeling training data by asking only for the labels of the most informative unlabeled examples. We introduce a novel approach to selective sampling which we call co-testing. Cotesting can be applied to problems with redundant views (i.e., problems with multiple disjoint sets of attributes that can be used for learning). We anal...
متن کاملSelective Sampling With Naive Co-Testing: Preliminary Results
Selective sampling, a form of active learning, reduces the cost of labeling training data by asking only for the labels of the most informative unlabeled examples. We introduce a novel approach to selective sampling which we call co-testing. Co-testing can be applied to problems with redundant views (i.e., problems with multiple disjoint sets of attributes that can be used for learning). We ana...
متن کاملPopulating Ontologies with Data from OCRed Lists
A flexible, accurate, and efficient method of automatically extracting facts from lists in OCRed documents and inserting them into an ontology would help make those facts machine searchable, queryable, and linkable and expose their rich ontological interrelationships. To work well, such a process must be adaptable to variations in list format, tolerant of OCR errors, and careful in its selectio...
متن کامل